Adding tmate action to CI for debugging #3138

Open
wants to merge 11 commits into
base: develop

pshriwise
Contributor

Description

This adds tmate to our CI in a mode that will only generate a connection if the CI action has failed. The connection will remain open for 15 minutes to allow someone with access to the action output to connect to the CI machine (once connected, the action will remain live until the user exits the terminal session).
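
For reference, a minimal sketch of how a failure-only session could be wired up, assuming the mxschmitt/action-tmate action and a step-level timeout for the 15-minute window. The step name and its placement at the end of the job are illustrative and not taken from the actual workflow. Note that a step-level timeout ends the step whether or not someone is connected, so the real configuration may enforce the window differently.

```yaml
# Sketch only: placed as the last step of a job so it can see earlier failures.
- name: Setup tmate session   # hypothetical step name
  # failure() is true only when a previous step in this job has failed.
  if: ${{ failure() }}
  uses: mxschmitt/action-tmate@v3
  # Assumption: the 15-minute window is enforced with a step timeout.
  timeout-minutes: 15
```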

Alternative options for deploying tmate are discussed in the issue below.

Fixes #3137

@ahnaf-tahmid-chowdhury
Contributor

This could be helpful for investigating PR #3087.

@pshriwise
Contributor Author

Okay, I think I found a solution I'm happy enough with here, even if it's a little less elegant than I'd like. Contributors can now push a commit to their PR whose message contains [gha-debug]. This produces a detached tmate session, which lets all steps of the workflow execute and then provides an SSH command for logging in and doing whatever problem solving is needed (default timeout is 10 min).
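
Roughly, the opt-in step could look like the sketch below. It assumes the `detached` input of mxschmitt/action-tmate and a `COMMIT_MESSAGE` environment variable populated by an earlier step. The step names and the test command are placeholders, not the real workflow contents.

```yaml
steps:
  - uses: actions/checkout@v4

  # Assumption: an earlier step has already exported COMMIT_MESSAGE.
  - name: Setup tmate debug session   # hypothetical step name
    if: ${{ contains(env.COMMIT_MESSAGE, '[gha-debug]') }}
    uses: mxschmitt/action-tmate@v3
    with:
      # Detached mode prints the SSH command and lets later steps keep running.
      detached: true

  - name: Run tests   # placeholder for the real build and test steps
    run: ./run_tests.sh   # hypothetical script
```

Starting the session early in detached mode means the connection details show up in the log right away, while the rest of the job runs as usual.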

GitHub Actions surprisingly doesn't make it easy to get the latest commit information from the github.event context for every event that triggers a workflow. For the pull_request event, the repo has to be checked out with more depth so that the merge commit's parents can be extracted and the correct commit message used to decide whether the tmate session should be enabled.
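
A minimal sketch of that lookup, under a few assumptions: the checkout uses `fetch-depth: 2`, the PR branch tip is the second parent of the merge commit that GitHub checks out for pull_request events, and the result is exported as `COMMIT_MESSAGE`. The step name and the heredoc delimiter are made up for illustration and may not match the actual workflow.

```yaml
- uses: actions/checkout@v4
  with:
    # Depth 2 fetches the checked-out commit plus its parents.
    fetch-depth: 2

- name: Record commit message   # hypothetical step name
  shell: bash
  run: |
    if [ "${{ github.event_name }}" = "pull_request" ]; then
      # HEAD is GitHub's merge commit; its second parent is the PR branch tip.
      ref="HEAD^2"
    else
      ref="HEAD"
    fi
    # Multi-line values need the delimiter form when written to GITHUB_ENV.
    {
      echo "COMMIT_MESSAGE<<__COMMIT_MSG_EOF__"
      git log -1 --format=%B "$ref"
      echo "__COMMIT_MSG_EOF__"
    } >> "$GITHUB_ENV"
```

Later steps in the same job can then test `env.COMMIT_MESSAGE` in their `if:` conditions.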

@pshriwise
Contributor Author

Just smoothed out one final snag here: the tmate action was causing jobs to fail when it timed out, and I wasn't a fan of that. Setting the continue-on-error flag on that step did the trick, so CI runs that trigger the tmate debug session with [gha-debug] in their commit messages should pass checks if the tests pass. This should keep PRs from needing another CI run when the tests pass with the tmate session enabled.

Here's an example of this working in my fork: https://github.com/pshriwise/openmc/actions/runs/11335356157

I'll note that the errors do still appear in the summary, but aren't really of consequence.

continue-on-error: true
if: ${{ contains(env.COMMIT_MESSAGE, '[gha-debug]') }}
uses: mxschmitt/action-tmate@v3
timeout-minutes: 10
Contributor

Doesn't this mean the session will end after 10 min? I have also noticed that when the log is huge, we can't scroll the window in the tmate session. Is there an alternative?

Contributor Author

This does mean the session will close after 10 minutes, yes. I don't think we want CI running indefinitely. This seemed like a reasonable window for someone to follow the progress of CI and log into the session.

The terminal session runs inside a tool called tmux. You can scroll back through the terminal output, but it takes a few extra keystrokes (Ctrl+B, then [, if memory serves).

Contributor

Can you please double check this? I was testing this workflow in my branch and found that the session gets closed even while I am logged in. Since the session only runs for a specific commit that opts in, I think we can increase the time limit to 1 h.

Contributor Author

Hrmm I hadn't experienced that. Let me confirm and I can increase the time limit.

Development

Successfully merging this pull request may close these issues.

Addition of entry point for debugging failed CI runs
2 participants